(57) Abstract

Genetic mapping is provided by either of two related general procedures. In the first procedure, mapping is provided by
identifying genetic regions from which DNA fragments derived from two individuals can combine to form extensive hybrids free
of base mismatches. DNA is processed by a method that allows perfectly-matched hybrid DNA molecules formed between
DNAs from the two individuals to be separated from imperfectly-paired DNA hybrids or hybrids in which both strands are from
the same individual's DNA. The perfectly-matched hybrid DNAs can then be labeled and the labeled DNA used as probes to
identify loci of identity-by-descent between the two individuals. In the second procedure, nicks are introduced specifically into
DNA hybrids formed between non-identical alleles from a region of heterozygosity in an individual diploid genome. The nicked
DNA molecules are then specifically labeled to provide probes for identifying regions of heterozygosity in the genome of an
individual.
GENOMIC MISMATCH SCANNING



CROSS-REFERENCE TO GOVERNMENT GRANT 
This invention was made with Government support under
contract HG00450 awarded by the National Institutes of
Health. The government may have certain rights in this
invention.

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application is a continuation-in-part of 
application serial no. 880,167 filed on May 6, 1992. 

INTRODUCTION
Technical Field 

The field of this invention is genetic mapping. 



Background 

Linkage mapping of genes involved in disease
susceptibility and other traits in humans, animals and
plants has in recent years become one of the most
important engines of progress in biology and medicine.
The development of polymorphic DNA markers as landmarks
for linkage mapping has been a major factor in this
advance. However, current methods that rely on these
markers for linkage mapping in humans are laborious,
allowing screening of at most a few markers at a
time. Furthermore, their power is limited by the sparsity
of highly-informative markers in many parts of the human
genome.

To map genes whose manifestations are recognized only
in the whole organism, the standard approach relies on
identifying linkage between the trait and a genetic marker
whose map position is already known. The most abundant
and generally useful class of markers in the human genome
is DNA sequence polymorphisms, either restriction fragment
length polymorphisms (RFLP's) or other DNA polymorphisms
that can be detected by hybridization with specific probes
or by amplification using specific primers.

RFLP mapping and related methods have had dramatic
successes. However, their utility is limited by two
problems: first, many discrete loci need to be examined,
but only one or a few loci can be typed at a time, which
makes the procedure arduous; second, the low density on the
map and low polymorphism information content (PIC value) of
available markers mean that multiple members of each
family need to be typed to obtain useful linkage
information, even when only two share the trait of
interest. While these disadvantages can be overcome to
some degree by automation and technical improvements, as
well as by developing closely-spaced extremely polymorphic
markers, the use of discrete markers for specific map
intervals has inherent limitations. The limitations are
particularly marked when applied to mapping strategies
that seek to reduce the number of individuals that need to
be analyzed in each family.

It is generally much easier to collect many pairs of
related individuals who share a trait of interest than to
collect a few large, well-documented pedigrees in which the
absolute number of individuals is smaller. For
medically-significant traits, affected individuals are likely to
present themselves for examination, whereas other family
members need to be traced and recruited. Furthermore,
individuals vital to pedigree analysis may be deceased.
Yet the low density and low information content of
available markers make the use of pedigrees almost
mandatory for linkage mapping by RFLP analysis.



WO 93/22462 PCT/US93/04160



Collection of appropriate families frequently poses the
principal barrier to mapping genes that influence human
traits, particularly genetically complex traits.
Strategies have been reported for linkage mapping using
the information in DNA from multiple very small sets of
affected relatives (typically pairs or even single
individuals). However, these strategies depend upon the
availability of closely-spaced highly informative genetic
markers throughout the genetic map.

The issues raised above in reference to linkage
mapping also apply to genetic risk assessment in medicine.

In principle, any base that differs among allelic
sequences could serve as a marker for linkage analysis.
Single-base differences between allelic single copy
sequences from two different haploid genomes have been
estimated to occur about once per 300 bp in an outbred
Western European population. This calculates to a total
of about 10^7 potential markers for linkage analysis per
haploid genome. Only a tiny fraction of these nucleotide
differences contributes to mapping using current methods.
There is, therefore, substantial interest in developing
new methods that utilize the available genomic information
more efficiently and can provide information concerning
multi-gene traits. Such methods could be valuable, not
only for gene mapping, but also for genetic diagnosis and
risk assessment.
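
The marker count quoted above follows from simple division. The sketch below reproduces the arithmetic; the haploid genome size of 3 x 10^9 bp is an assumption that is not stated in the text, and the function name is hypothetical.

```python
# Assumption (not given in the text): a haploid genome of about 3 x 10^9 bp.
HAPLOID_GENOME_BP = 3_000_000_000

# From the text: about one single-base difference per 300 bp.
BP_PER_DIFFERENCE = 300


def potential_markers(genome_bp: int, bp_per_difference: int) -> int:
    """Expected number of single-base differences between two haploid genomes."""
    return genome_bp // bp_per_difference


if __name__ == "__main__":
    count = potential_markers(HAPLOID_GENOME_BP, BP_PER_DIFFERENCE)
    print(f"about {count:.0e} potential markers per haploid genome")
```

With these inputs the estimate is on the order of ten million single-base differences, far more than the number of markers typed in a conventional linkage study.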

Relevant Literature 

The use of RFLP's is described

±IL CUUtot-Crxil , c= i- ci-i. - , vj-.'w^/ ' ^ " 

Donis-Keller, et al. (1988) Cell 51:319-337; Kidd, et al.
(1989) Cytogenet. Cell Genet. 51:622-947 and Risch (1990)
Am. J. Hum. Genet. 46:242-253. Mapping strategies may be
found in Risch (1990) Am. J. Hum. Genet. 46:229-241;
Lander and Botstein (1987) Science 236:1567-1570; and
Bishop and Williamson (1990) Am. J. Hum. Genet.
46:254-265. Sandra and Ford (1986) Nucleic Acids Res.
14:7265-7282 and Casna, et al. (1986) Nucleic Acids Res.
14:7285-7303 describe genomic analysis.

SUMMARY OF THE INVENTION 
Genomic analysis is achieved through the process of:
digesting DNA to be compared from two different sources,
usually individuals who are genetically related or
suspected of being genetically related, with a restriction
enzyme that cuts relatively infrequently; combining single
strands of the genomic fragments from the two individuals
under conditions whereby heterohybrids (hybrids containing
one strand from each individual) can be distinguished from
homohybrids (hybrids containing both strands from the same
individual); separating homohybrids from heterohybrids;
separating mismatch-free heterohybrids from hybrids with
mismatches; preparing labeled probes from the
mismatch-free heterohybrids; and identifying regions of genetic
identity between the two individuals by means of said
labeled probes. The mismatch-free heterohybrids provide
highly specific probes for regions of genetic identity by
descent, since sufficiently large hybrid DNA molecules
formed from non-identical regions are expected to have at
least one and usually many base mismatches.
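
The expectation that large hybrids from non-identical regions carry at least one mismatch can be illustrated numerically. The sketch below is only an approximation: it assumes that base differences are scattered independently along the fragment (a Poisson model, which is an assumption and not part of the described method) and uses the rate of about one difference per 300 bp quoted in the Background. The fragment lengths are arbitrary examples.

```python
import math


def p_at_least_one_mismatch(fragment_bp: int, bp_per_difference: float = 300.0) -> float:
    """Poisson approximation: P(at least one difference) = 1 - exp(-length / spacing)."""
    return 1.0 - math.exp(-fragment_bp / bp_per_difference)


if __name__ == "__main__":
    # Hypothetical fragment lengths, chosen only for illustration.
    for length in (300, 1_000, 5_000, 20_000):
        print(f"{length:>6} nt: {p_at_least_one_mismatch(length):.8f}")
```

Under this model the probability of a mismatch-free hybrid between non-identical alleles falls off exponentially with fragment length, which is why long restriction fragments give specific probes.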

Alternatively, one may map regions of heterozygosity,
and, by inference, homozygosity in a single individual by
isolating DNA fragments substantially free of multicopy
DNA; melting the DNA and reannealing to provide for hybrid
fragments; introducing nicks specifically in mismatched
DNA; labeling the nicked DNA; and using the labeled DNA as
probes to identify regions of heterozygosity. Regions of
homozygosity or hemizygosity (where all or a significant
portion of a chromosome is missing, e.g. aneuploidy) are
inferred by the absence of hybridized label.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 
Methods and compositions are provided for genomic
mapping by identifying regions of the genome at which DNA
sequences from two DNA sources are perfectly identical
over long stretches (typically 10^ to 2 x 10 nt).
Depending upon the nature of the probe, the procedures may
vary. In a procedure that allows for labeling of regions
of genetic identity between two individuals, DNA sources
are digested with a restriction enzyme that cuts
infrequently; the resulting DNA is processed to isolate
mismatch-free heterohybrid dsDNA fragments free from
mismatch-containing or homohybrid fragments; the
mismatch-free heterohybrid fragments are labeled and then used to
identify regions of genetic identity between the two
sources.

Alternatively, to map regions of homozygosity or
heterozygosity in an individual genome, one may digest the
DNA from that individual; optionally, remove multi-copy
DNA; melt and reanneal the DNA; introduce nicks into
mismatched dsDNA; label the mismatched dsDNA; and use the
labeled DNA as probes for identifying genomic regions that
are heterozygous, leaving regions that are homozygous or
hemizygous having substantially lower labeling.

The DNA source may be any source, haploid to
polyploid genomes, normally eukaryotic, and may include
vertebrates and invertebrates, including plants and
animals, particularly mammals, e.g. humans. The DNA will
be of high complexity where each of the sources will
usually have greater than 5 x 10^ bp, usually greater than
10^ bp, more usually greater than about 10^ bp. Thus, in
any situation where one wishes to compare two sources of
DNA as to their genetic similarities, whether the sources
are related or not, the subject method may be employed.
Usually, the sources will be related, being of the same
species, and may be more closely related in having a
common ancestor not further away than six, frequently
four, generations.

For linkage mapping or genetic diagnosis, genetically
related individuals are required. Thus, the subject
method may find application in following segregation of
traits associated with breeding of plants and animals, the
association of particular regions in the genomic map with
particular traits, especially traits associated with
multiple genes, the transmission of traits from ancestors
or parents to progeny, the interaction of genes from
different loci as related to a particular trait, and the
like. While only two sources may be involved in the
comparison, a much larger sampling may be involved, such
as 20 or more sources, where pairwise comparisons may be
made between the various sources. Relationships between
the various sources may vary widely, e.g. grandparents and
grandchildren; siblings; cousins; and the like.

Depending upon whether the regions of genetic
identity or regions of non-identity are to be labeled, the
treatment of the DNA from the source will vary. The DNA
may be processed initially in accordance with conventional
ways, lysing cells, removing cellular debris, separating
the DNA from proteins, lipids or other components present
in the mixture and then using the isolated DNA for
restriction enzyme digestion. See Molecular Cloning, A
Laboratory Manual, 2nd ed. (eds. Sambrook et al.) CSH
Laboratory Press, Cold Spring Harbor, NY 1989. Usually,
at least about 0.5 µg of DNA will be employed, more
usually at least about 5 µg of DNA, while less than 50 µg
of DNA will usually be sufficient.

The following procedure will address solely the
methodology employed for isolating and labeling of DNA
corresponding to regions of identity-by-descent between
two related individuals. The total DNA from both cellular
sources is digested completely with a restriction enzyme
that cuts relatively infrequently, generally providing
fragments of
about 0.5-2x10^ nt. The size is selected to substantially
ensure the presence of at least one GATC sequence, and at
least one base difference between any allelic fragments
not identical by descent, i.e. to ensure that homohybrid
fragments, or heterohybrid fragments that are not
identical by descent, sustain at least one cut in a
subsequent step. This enzyme will normally recognize at
least a 6-nucleotide consensus sequence and may involve
either blunt-ended or staggered-ended cuts. For the
method described in detail here, an enzyme that cuts to
leave a protruding 3' end is needed. The protruding 3'
ends are preferred specifically when exonuclease III
digestion, followed by BNDC binding, is ultimately used to
eliminate homohybrids and mismatched DNAs. Restriction
enzymes yielding other sorts of ends may be preferred if
other specific steps are substituted, as described further
below.

The resulting DNA fragments are then processed to
provide for a means of separating complementary DNA
hybrids, where the two strands are from different sources,
from complementary DNA hybrids where the two strands are
from the same source. This may be achieved in different
ways. The method exemplified in the subject invention
uses the following steps. DNA from one of the sources is
methylated with a sequence specific methylase, such as Dam
methylase or a restriction methylase, so as to
substantially completely methylate the consensus sequences
of the DNA from one of the sources. The other source is
left unmodified or methylated completely with a different
restriction methylase.

The two DNA samples are then mixed, denatured and
allowed to reanneal. A practical rate of complete
annealing of the complex DNA samples can be achieved by
using chemical or protein catalysts under conditions that
preserve large DNA strands (Casna, et al. (1986) Nucleic
Acids Res. 14:7285-7303; Barr and Emanuel (1990) Anal.
Bioch. 186:369-373; Anasino (1986) ibid 152:304-307). It
is also necessary to avoid or minimize network formation
resulting from rapid hybridization of non-allelic repeated
sequences, so that simple fully-duplex products can be
recovered even when dispersed repetitive sequences are
embedded in them. Annealing conditions that have been
shown to meet this requirement are FPERT conditions as
described in Casna (1986) supra.

The reannealed DNA mixture is digested with two
methylation-sensitive restriction endonucleases. Hybrids
formed between the two different DNA specimens will be
hemimethylated at the methylation sites. The restriction
enzymes are selected so as to digest sites which are
unmethylated or doubly methylated, while being incapable
of cleaving the hemimethylated sites. Desirably, the
sequence will be a relatively common sequence, generally
occurring on the average of about 1-10x10^, preferably
about 1-5x10^. A more stringent selection for mixed-donor
hybrids can be achieved by using additional combinations
of restriction/modification enzymes (Casna (1986) supra).
Optionally, one may remove unannealed DNA (single stranded
DNA) at this time using any convenient method, e.g.
adsorption to BNDC.

Various combinations of enzymes can be employed. The
combination should ensure that there is at least one cut
in any homohybrid fragment, preferably at least two or
more, so the sequence should be relatively common. For
example, E. coli dam methylase, which recognizes GATC for
methylation, may be employed for methylation. This
modification enzyme may then be used with the
methylation-sensitive restriction endonucleases DpnI and MboI, which
cleave at GATC sites, the former at doubly-methylated
sites and the latter at unmethylated sites. The
particular sequence "GATC" is found every few hundred bp
in the human genome. While a single
restriction/modification site is preferable, two or three
sites may be involved where different combinations of
modification and restriction enzymes are employed. With
DNA from sources other than human, other combinations of
modification enzymes and restriction enzymes may be
employed.
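
The methylation-based selection can be summarized as a small truth table over a single GATC site. The following is a minimal sketch, not the procedure itself: the function names and the source labels "A" and "B" are hypothetical, and the model assumes that source A is fully Dam-methylated, that source B is unmethylated, and that cleavage is all-or-nothing.

```python
from typing import Optional


def enzyme_cutting_site(top_methylated: bool, bottom_methylated: bool) -> Optional[str]:
    """Return the enzyme that cleaves a GATC site in this methylation state, if any."""
    if top_methylated and bottom_methylated:
        return "DpnI"  # cleaves doubly-methylated sites
    if not top_methylated and not bottom_methylated:
        return "MboI"  # cleaves unmethylated sites
    return None  # hemimethylated sites resist both enzymes


def survives_digestion(top_source: str, bottom_source: str,
                       methylated_source: str = "A") -> bool:
    """True if a duplex built from these two strands is left uncut at a GATC site."""
    top = top_source == methylated_source
    bottom = bottom_source == methylated_source
    return enzyme_cutting_site(top, bottom) is None


if __name__ == "__main__":
    for top in ("A", "B"):
        for bottom in ("A", "B"):
            print(top, bottom, "uncut" if survives_digestion(top, bottom) else "cut")
```

Only the two mixed-source combinations are reported as uncut, which is the property the digestion step relies on to mark homohybrids for later removal.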

An alternative procedure is as follows. By cutting
the two DNA samples using a different restriction enzyme
for each, specifically two enzymes that share a common
recognition sequence but in one case cut with an N-base 3'
overhang, and in the other with an N-base 5' overhang, only
the heterohybrids will have flush ends. An example of
such a pair of restriction enzymes, for the case N=4, is
Acc65I and KpnI. The hybrids with both strands derived
from the same DNA sample will retain either a 5' or 3'
overhanging end.
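
The claim that only heterohybrids end up flush can be checked by tracking where each strand begins and ends. The sketch below is an illustration under stated assumptions: it uses Acc65I (G^GTACC, 4-base 5' overhang) and KpnI (GGTAC^C, 4-base 3' overhang) as the enzyme pair, with cut positions taken from standard enzyme references rather than from the text, and all names and coordinates are hypothetical.

```python
from typing import NamedTuple, Tuple

# Cut position on each strand, as an offset from the start of the GGTACC
# site in top-strand coordinates.
CUT_OFFSET = {
    "KpnI": {"top": 5, "bottom": 1},    # GGTAC^C on both strands -> 3' overhang
    "Acc65I": {"top": 1, "bottom": 5},  # G^GTACC on both strands -> 5' overhang
}


class Strand(NamedTuple):
    start: int  # first base covered (top-strand coordinates)
    end: int    # one past the last base covered


def strand(enzyme: str, which: str, left_site: int, right_site: int) -> Strand:
    """Extent of one strand of the fragment lying between two recognition sites."""
    offset = CUT_OFFSET[enzyme][which]
    return Strand(left_site + offset, right_site + offset)


def classify_end(top_pos: int, bottom_pos: int, left_end: bool) -> str:
    if top_pos == bottom_pos:
        return "flush"
    top_protrudes = top_pos < bottom_pos if left_end else top_pos > bottom_pos
    # The top strand has its 5' end on the left; the bottom strand on the right.
    if left_end:
        return "5' overhang" if top_protrudes else "3' overhang"
    return "3' overhang" if top_protrudes else "5' overhang"


def duplex_ends(top: Strand, bottom: Strand) -> Tuple[str, str]:
    """Classify the left and right ends of a duplex formed from two strands."""
    return (classify_end(top.start, bottom.start, True),
            classify_end(top.end, bottom.end, False))


if __name__ == "__main__":
    left, right = 1000, 9000  # hypothetical site positions
    for top_enzyme in CUT_OFFSET:
        for bottom_enzyme in CUT_OFFSET:
            ends = duplex_ends(strand(top_enzyme, "top", left, right),
                               strand(bottom_enzyme, "bottom", left, right))
            print(f"top {top_enzyme:7s} bottom {bottom_enzyme:7s} -> {ends}")
```

In this model a top strand from one digest pairs end-to-end with a bottom strand from the other digest, while same-digest duplexes keep their 4-base overhangs.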

The flush-ended heterohybrids will uniquely be able
to be ligated to a flush-ended partner, such as an
oligonucleotide. This oligonucleotide might, for example,
have a hairpin structure, such that the "capped" ends are
protected from exonuclease digestion. Variations on this
principle are possible, exploiting the distinctive
structures of the ends of heterohybrids compared with
homohybrids, when related restriction enzyme pairs are
used.

This strategy for selecting heterohybrids can replace
the MboI/DpnI digestion step in selecting heterohybrids.
However, a 5' to 3' exonuclease, such as bacteriophage
lambda exonuclease, or a combination of a helicase and
exonuclease VII and/or I, or another combination of
enzymes to allow digestion of all uncapped ends would need
to be used in addition to, or in place of, exonuclease III,
following the MutHLS nicking step in the procedure
outlined below.

Other techniques may also be used for the initial
separation of homohybrids from heterohybrids. By growing
the cells from one of the sources with heavy
isotope-labeled nucleosides, heavy atom labeled nucleosides, or
other isotope labeled precursors, the two strands from the
different sources will differ in density. Isotopes that
may be used include ^^N, ^^C, ^H, etc. The duplexes may
then be separated by density banding.

Alternatively, one may label the DNA from the two
sources with different labels, e.g. using labeled
nucleotides and terminal deoxynucleotidyl transferase, by
random conjugation, and the like. One can then separate
all of the duplexes as to one label, and then divide that
group into homo- and heterolabeled duplexes. For example,
biotin and avidin may be used to separate at one stage,
where the avidin is bound to magnetic beads, and
2,4-dinitrophenyl and anti-(2,4-dinitrophenyl) may then be
used for the second separation, where the
anti-(2,4-dinitrophenyl) is bound to a support.

Alternatively, one may select restriction enzymes
which provide for overhangs, where heterohybrids will
result in blunt ends or overhangs that differ from those
of the homohybrids. One may then use the overhangs to
separate homohybrids, leaving the heterohybrids.

Returning to the description of a principal
embodiment, the resulting mixture of uncut heterohybrids
and MboI or DpnI cut homohybrids is then subjected to a
system which allows for separation of DNA duplexes with
some mismatched base pairs from complementary
perfectly-matched DNA duplexes. See, for example, Lahue, et al.
(1989) Science 245:160-164; Su and Modrich (1986) Proc.
Natl. Acad. Sci. USA 83:5057-5061; Grilley, et al. (1989)
J. Biol. Chem. 264:1000-1004; Su, et al. (1989) Genome
31:104-111; and Learn and Grafstrom (1989) J. Bacteriol.
171:6473-6481.

Illustrative of such a system is the "methyl-directed
mismatch repair" pathway of E. coli. The system uses the
purified mutS, mutL, mutH and uvrD gene products, as well
as exonuclease I, exonuclease VII or RecJ exonuclease,
single strand binding (SSB) protein and DNA
polymerase III. In vitro, seven of the eight possible
single base mismatches, as well as small insertions and
deletions, are efficiently recognized. When DNA synthesis
is prevented, the system specifically introduces large
gaps in the mismatch-containing molecules. The purified
mutS, mutL and mutH proteins act in concert to introduce
nicks specifically into DNA molecules that contain
mismatches. With the exception of the C-C mismatch, the other
mismatches are effectively identified by one or more of
the MutX enzymes (X indicates S, L or H).

Exonuclease III can initiate exonucleolytic digestion
at a nick and digest the nicked strand in the 3' to 5'
direction to produce a gap. Exonuclease III can also
initiate digestion at a recessed or flush 3' end, but it
cannot initiate digestion at a protruding 3' end of a
linear duplex. Digestion from the ends of the linear DNA
hybrids can, therefore, be prevented by choosing a
restriction enzyme that produces protruding 3' ends for
the initial digestion of the genomic DNA (Henikoff (1984)
Gene 28:351-360). The DNA ends produced by the
restriction enzymes used to make the smaller fragments for
distinguishing hemimethylated sites from unmethylated or
dimethylated sites are selected to provide recessed (e.g.
MboI) or flush (DpnI) 3' termini and these termini are
susceptible to digestion by exonuclease III. Therefore,
exonuclease III provides for partial single strands in
hybrid molecules where both strands are derived from the
same individual or where the strands contain base
mismatches.

In carrying out the process, the duplexes obtained
from annealing and restriction digestion are exposed to
the methyl-directed mismatch repair system in vitro.
The mutS, mutL and mutH proteins introduce nicks specifically into
the mismatched DNA molecules, while the exonuclease III
can introduce gaps at nicked sites and at recessed 3'
termini. The mismatch-free heterohybrids, corresponding
to regions of identity-by-descent between the sources, can
now be distinguished from all other duplexes by virtue of
the absence of significant gaps or partial single strand
regions.

The partially single stranded and single stranded DNA
is now separated from the fully-duplex DNA. This can be
efficiently achieved using benzoylated naphthylated DEAE
cellulose (BNDC). At high salt concentration, BNDC
binds single stranded and partially single-stranded DNA
molecules with high efficiency and may be separated by
centrifugation or other separation means. The unbound DNA
molecules are recovered, which in this case comprise
complementary sequences from the two sources.
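
The fate of each class of duplex through the nicking, exonuclease III and BNDC steps can be followed with a simple model. This is a sketch under simplifying assumptions, not the procedure itself: every homohybrid is assumed to contain at least one GATC site and therefore to have been cut by MboI or DpnI, every mismatch is assumed to be recognized (the text notes that C-C mismatches are not), and each enzymatic step is treated as going to completion. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Duplex:
    heterohybrid: bool  # one strand from each source (hemimethylated at GATC)
    mismatched: bool    # carries at least one recognized base mismatch


# Structures at which exonuclease III can start digestion, as stated in the text.
EXO3_CAN_START = {
    "protruding 3' end": False,
    "recessed or flush 3' end": True,
    "nick": True,
}


def structures(d: Duplex) -> List[str]:
    """Structures present on a duplex after MboI/DpnI digestion and MutHLS nicking."""
    found = ["protruding 3' end"]  # left by the initial infrequent-cutter digestion
    if not d.heterohybrid:
        found.append("recessed or flush 3' end")  # homohybrids are cut at GATC
    if d.mismatched:
        found.append("nick")  # introduced by MutS, MutL and MutH
    return found


def bound_by_bndc(d: Duplex) -> bool:
    """A duplex gapped by exonuclease III is partially single-stranded and binds BNDC."""
    return any(EXO3_CAN_START[s] for s in structures(d))


def recovered(d: Duplex) -> bool:
    """True if the duplex stays fully duplex and is recovered in the unbound fraction."""
    return not bound_by_bndc(d)


if __name__ == "__main__":
    for hetero in (True, False):
        for mismatched in (True, False):
            d = Duplex(hetero, mismatched)
            print(d, "recovered" if recovered(d) else "removed")
```

In this model only the mismatch-free heterohybrid escapes gapping, which matches the fraction the method sets out to recover.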

A more complete methyl-directed mismatch repair
enzyme system may ultimately prove to be superior in
specificity to the simple system using only MutS, MutL and
MutH. For example, MutL, MutS and MutH, plus helicase II
(UvrD protein), exonuclease I, exonuclease VII,
single-strand binding protein, and DNA polymerase III, acting in
concert, can carry out mismatch dependent DNA synthesis,
and thereby specifically introduce labeled or modified
nucleotides into mismatch-containing DNA molecules. For
example, by using biotinylated nucleotides, mismatched DNA
molecules can be specifically biotinylated, and then
immobilized avidin or related methods can be used to
remove the mismatched molecules from a mixture of
perfectly-matched and mismatched DNA molecules.
Alternatively, use of the same set of enzymes, excluding
DNA polymerase, would lead directly to gaps in the
mismatched DNA, eliminating the exonuclease III step. For
such a procedure, the ends of the linear DNA molecules
produced in the initial annealing step would need to be
protected from the action of the exonucleases and
helicase. This procedure would thus interface well with
the end-capping method suggested above as an alternative
method for selection for heterohybrid molecules.

Related enzyme systems based on mismatch repair
proteins from E. coli or other
organisms could in principle substitute here for the
particular enzyme system described.

The mismatch-free heterohybrid dsDNA duplexes may be
used without expansion. The subject method provides a
sufficiently clean separation of the mismatch-free
heterohybrid dsDNA duplexes in sufficient amount to allow
for their labeling and direct use as probes. By using a
readily available amount of DNA, usually
about 1 to 10 µg DNA from each source, a satisfactory
amount of the mismatch-free heterohybrid dsDNA duplexes
is obtained for labeling and probing a DNA sample.

The mismatch-free heterohybrid dsDNA sequences from
the two sources can be used for identifying the regions of
identity-by-descent between the two sources.
Conveniently, the dsDNA may be labeled for use as a probe
to identify the corresponding genomic regions. A wide
variety of labels may be employed, particularly
radioisotopes, fluorescers, enzymes, and the like. The
particular choice of label will depend upon the desired
sensitivity, the nature of the genomic sample being
probed, the sensitivity of detection required, and the
like. Thus, the more complex the genome under analysis,
the higher the sensitivity which would be desired.
Various instruments are available which allow for
detection of radioactivity, fluorescence, and the like.

The probes may be prepared by any convenient
methodology, such as nick translation, random hexamer
primed labeling, polymerase chain reaction using primers
that prime outward from dispersed repetitive sequences or
random sequences, and the like.

The DNA that is probed may take a variety of forms,
but essentially consists of a physically-ordered array of
DNA sequences that can be related back to the physical
arrangement of the corresponding sequences in the genome
(See Boyle, et al. (1990) Genomics 7:127-130; Pinkel,
et al. (1988) PNAS USA 87:6634-6638). A metaphase
chromosome spread is one naturally-occurring example of
such an array. Alternatively, and preferably, a partial
or a complete collection of cloned, amplified, or
synthetic DNA sequences corresponding to known genetic
locations, immobilized in an ordered array on a solid
substrate such as a membrane or a silicon or plastic chip,
can be used as the target for probing.

The hybridizations with the probes are performed
under conditions that allow the use of complex mixtures of
DNA probes and suppress artifactual hybridization to
repeated sequences (Boyle, et al. (1990) Genomics
7:127-133; Pinkel, et al. (1988) Proc. Natl. Acad. Sci.
USA 85:9138-9142; Lichter, et al. (1990) ibid
87:6634-6638). For example, in the case of
grandparent-grandchild pairs, hybridization should occur in
approximately 25 large patches (averaging about 4 µ in
length when prometaphase chromosomes are used), which in
aggregate should cover about one-half of the genome.
The boundaries of the patches will be determined by
the ends of chromosomes and sites of meiotic crossing
over. The boundaries will reflect sites of meiotic
recombination that occurred in meioses intervening between
the two relatives. Since the areas of hybridization will
typically be in large contiguous patches, the method is
very robust with respect to contamination of the probe
with sequences representing regions that are not identical
between the two subjects. Even if only a modest
enrichment of identical-by-descent sequences is achieved,
the patches of identity and non-identity should be
distinguishable as contiguous blocks of greater or lower
signal intensity.
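
The robustness argument above rests on contiguity: neighbouring positions on the ordered array share the same state, so averaging over neighbours suppresses noise. The sketch below illustrates one way such blocks might be called from a list of hybridization intensities ordered by map position. The moving-average window and the midpoint threshold are illustrative choices rather than part of the described method, and the function names are hypothetical. A non-empty signal is assumed.

```python
from typing import List, Optional, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at the array edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def call_blocks(signal: List[float], window: int = 5,
                threshold: Optional[float] = None) -> List[Tuple[int, int, bool]]:
    """Return (start, end, is_high) blocks, with end exclusive, in map order."""
    smoothed = moving_average(signal, window)
    if threshold is None:
        # Illustrative default: midway between the lowest and highest smoothed value.
        threshold = (min(smoothed) + max(smoothed)) / 2
    blocks = []
    start, state = 0, smoothed[0] >= threshold
    for i in range(1, len(smoothed)):
        current = smoothed[i] >= threshold
        if current != state:
            blocks.append((start, i, state))
            start, state = i, current
    blocks.append((start, len(smoothed), state))
    return blocks


if __name__ == "__main__":
    # Hypothetical intensities: a low region followed by a roughly 2-fold enriched one.
    signal = [1.0, 1.3, 0.8, 1.1, 0.9, 1.2, 2.1, 1.8, 2.3, 1.9, 2.2, 2.0]
    for start, end, is_high in call_blocks(signal, window=3):
        print(start, end, "identity" if is_high else "non-identity")
```

Larger windows tolerate noisier probes at the cost of blurring the block boundaries, which correspond to the crossover sites.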

Alternatively, rather than using the selected
mismatch-free, heterohybrid restriction fragments as
probes, they themselves can be immobilized on a solid
substrate, typically a Southern blot performed after
resolving the DNA restriction fragments by gel
electrophoresis. The immobilized fragments can then be
probed by hybridization using labelled DNA probes specific
to regions of interest. The presence or absence of
hybridizing bands recovered from a specific pairwise
comparison will indicate whether the pair has identity by
descent at the locus in question. This procedure is
likely to be useful for refining the resolution of a map,
after initial mapping is achieved by using selected
fragments as probes.

Genetic regions identified by this mapping method may
be cloned by standard methods or in some cases by direct
cloning of the fragments selected by genomic mismatch
scanning. These sequences can then be analyzed for their
biological function, and in some cases used directly or in
synthetic or modified form for diagnostic or therapeutic
applications.

When the region of genetic identity between two or
more individuals is sufficiently small, for example, in
plant breeding when products of serial backcrosses with
selection for a useful trait are compared, it may be
useful to clone the selected identical-by-descent
restriction fragments, since the resulting clone pool
would be highly enriched for the desired gene sequence
(responsible for the selected trait).

An alternate embodiment of genetic mapping looks to
differences within an individual genome. One may look to
identify regions of a genetic map where an individual or a
sample is or is not heterozygous. This method is
predicated on the isolation of single-copy sequences
corresponding to regions of heterozygosity in the test
individual, based on the ability of that individual's DNA
to give rise to hybrid DNA molecules with base mismatches
when the individual's DNA is denatured and reannealed.
The selected mismatched hybrid sequences are then labeled
and used as probes for hybridization to a physically
ordered array of a genomic DNA sample. Because regions of
the genome that lack heterozygosity are unable to produce
single-copy DNA hybrids with base mismatches, these
regions are visualized as gaps in the hybridization
pattern. The following is an exemplary protocol.

The DNA sample is digested to completion with a
restriction enzyme that cuts the DNA frequently enough
that most of the resulting fragments should not contain
repetitive sequences. Illustrative restriction enzymes
include enzymes having four-nucleotide consensus
sequences, such as AluI, RsaI, TaqI, HaeIII and MboI. The
resulting fragments will for the most part be in the range
of about 200-4000 nucleotides. After melting or
denaturing the DNA, the DNA is allowed to reanneal, with
the rapidly reannealing multi-copy or repeated DNA
sequences (low C0t) being removed, for example, by
hydroxyapatite chromatography, based on their rapid
reannealing, followed by allowing the remaining DNA to
anneal completely. Removal of the low C0t DNA is
desirable, but may not be essential. After complete
reannealing, desirably removing residual unannealed DNA,
the small, mostly single-copy DNA fragments that remain
are incubated with the methyl-directed mismatch repair
proteins described above and ATP to produce nicks in
mismatched DNA duplexes. The nicks introduced
specifically in mismatched molecules by the
methyl-directed mismatch repair proteins allow for
nick-translation DNA synthesis by a DNA polymerase, and
thus allow labeling with radiolabeled or other labeled
nucleotide triphosphates. A polymerase that lacks 3' to 5'
exonuclease activity, but retains 5' to 3' exonuclease
activity, e.g. Taq polymerase, is preferred for this step
to avoid labeling by replacement synthesis at the ends of
the fragments.
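
The reason a four-nucleotide recognition sequence cuts "frequently" can be illustrated with a rough calculation. The sketch below is not part of the described protocol; it assumes the four bases are equally frequent and independent, which real genomes only approximate, and the function name is hypothetical.

```python
def expected_fragment_length(site_length: int) -> int:
    """Mean spacing, in base pairs, between occurrences of a recognition site.

    Assumption: the four bases are equally frequent and independent, so a
    specific site of length k starts at any position with probability 4**-k.
    """
    if site_length < 1:
        raise ValueError("site_length must be a positive integer")
    return 4 ** site_length


# Four-nucleotide sites (e.g. AluI, RsaI, TaqI, HaeIII, MboI)
four_cutter = expected_fragment_length(4)
# Six-nucleotide sites (e.g. PstI)
six_cutter = expected_fragment_length(6)
print(four_cutter, six_cutter)
```

Under this idealized model a four-nucleotide site recurs about every 256 bp and a six-nucleotide site about every 4096 bp; actual fragment sizes depend on base composition and sequence context.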

The described procedure will provide probes that
densely and specifically cover the regions of the genetic
map that are heterozygous in the test individual.
Conversely, the regions that are not heterozygous will not
provide labeled probes and so should be recognized as
distinctive gaps in the hybridization signal. In general,
except where the coefficient of inbreeding is very low,
the regions of homozygosity will be in contiguous patches
of sufficient size that even a few-fold, e.g. 3- to
10-fold, difference in hybridization intensity between
homozygous and heterozygous sites should produce
discernible boundaries. Thus, as indicated previously, the
separation of heterozygous from homozygous sequences need
not be absolute. In addition to clinical and mapping
applications, this procedure is likely to be useful in
plant and animal breeding, since backcrossing and
selection can be used to isolate a gene responsible for a
trait of interest to a small region of heterozygosity or
homozygosity.
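
Recognizing contiguous gaps in the hybridization signal can be sketched in a few lines. The following is only an illustration under stated assumptions: the signal is taken to be one intensity value per position of the ordered array, and the threshold, minimum run length, function name and example values are all hypothetical rather than taken from the procedure.

```python
from typing import List, Sequence, Tuple


def find_homozygous_runs(
    signal: Sequence[float], threshold: float, min_run: int = 3
) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs of runs where signal < threshold.

    Only runs of at least `min_run` consecutive array positions are kept,
    so an isolated weak spot is not reported as a homozygous patch.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(signal):
        if value < threshold:
            if start is None:
                start = i  # a low-signal run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the array.
    if start is not None and len(signal) - start >= min_run:
        runs.append((start, len(signal) - 1))
    return runs


# Hypothetical intensities along an ordered array of clones.
signal = [9.0, 8.5, 10.2, 1.1, 0.9, 1.3, 1.0, 9.7, 1.2, 8.8, 9.1]
runs = find_homozygous_runs(signal, threshold=3.0, min_run=3)
print(runs)
```

Requiring a minimum run length reflects the point made above: because homozygous regions occur as contiguous patches, a modest intensity difference sustained over neighbouring positions is more informative than any single position.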

For convenience, kits may be supplied which provide
the necessary reagents together in a convenient form.
For example, for the genomic mismatch screening, kits
could be provided which would include at least two of the
following: restriction enzymes, one or more of which
provide for average fragments from the target genome of a
size in the range of about 0.5 - 10 x 10*; one or more
modification enzymes; restriction enzymes which
distinguish between hemimethylated and unmethylated or
dimethylated consensus sequences; enzymes capable of
introducing nicks at mismatches and expanding the nick to
a gap of many (≥10) nucleotides; DNA polymerase; BNDC
cellulose; and labeled triphosphates or labeled linkers
for blunt end ligation or other composition for labeling
the sequences to provide probes. Other components such as
a physically ordered array of immobilized DNA genomic
clones or metaphase chromosomes, automated systems for
determining and interpreting the hybridization results,
software for analyzing the data, or other aids may also be
included depending upon the particular protocol which is
to be employed.

The subject methodology may find particular
application in mapping genes by use of affected relative
pairs, that is, pairs of relatives that have a genetically
influenced trait of interest. "Affected relative pair"
methods are preferred, particularly when the penetrance of
the allele that confers the trait is low or age-dependent,
or when the trait is multigenic or quantitative, e.g.
height and build. Disease-susceptibility genes are
particularly relevant. By determining where on the
genetic map a small set, including two, of "affected"
relatives have inherited identical sequences from a common
source, and disregarding other family members, a highly
efficient strategy for extracting linkage information from
a pedigree is provided. The resulting identity-by-descent
maps from multiple pairs of similarly-affected relatives
can be combined and the composite map searched for loci
where genotypic concordance between affected relatives
occurs more frequently than would be expected by chance.
With a sufficiently large number of affected relative
pairs, such an analysis can reveal the positions of genes
that contribute even a slight susceptibility to the trait.
The procedure may also find wide application in routine
screening for shared genetic risks in families.
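
One way to search a composite map for loci where concordance exceeds chance is a one-sided binomial test at each locus. The sketch below is an illustration only, not the analysis prescribed here: it assumes all pairs are of the same relationship type, so that a single chance probability of identity by descent applies, and it ignores correction for testing many loci. The function names, the chance probability of 0.25, the significance level and the example data are hypothetical.

```python
from math import comb
from typing import List, Sequence


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def excess_sharing_loci(
    ibd_maps: Sequence[Sequence[bool]], p_chance: float, alpha: float
) -> List[int]:
    """Indices of loci shared identical-by-descent more often than chance.

    ibd_maps[pair][locus] is True when that affected pair is identical by
    descent at that locus. p_chance is the assumed probability of sharing
    at an unlinked locus for this type of relative pair.
    """
    n_pairs = len(ibd_maps)
    n_loci = len(ibd_maps[0])
    flagged = []
    for locus in range(n_loci):
        shared = sum(1 for pair_map in ibd_maps if pair_map[locus])
        if binomial_tail(shared, n_pairs, p_chance) < alpha:
            flagged.append(locus)
    return flagged


# Hypothetical data: 8 affected pairs, 4 loci. Every pair shares locus 2;
# the other loci are shared only by the first two pairs.
ibd_maps = [[(locus == 2) or (pair < 2) for locus in range(4)] for pair in range(8)]
flagged = excess_sharing_loci(ibd_maps, p_chance=0.25, alpha=0.01)
print(flagged)
```

In practice the chance probability depends on the relationship between the relatives, and a genome-wide search would need a stricter threshold than the illustrative value used here.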

The following examples are offered by way of 
illustration and not by way of limitation. 
EXPERIMENTAL

Saccharomyces cerevisiae, baker's yeast, was used as
a test system, because this is genetically the best
characterized eukaryotic organism, and it is easy to
prepare and characterize yeast clones of defined genetic
relatedness.

Test of the method: Two independently isolated
haploid clones of Saccharomyces, Y55 (HO his4 leu2 CAN1^S
ura3 GAL2) and Y24 (ho HIS4 LEU2 MATa can1^R URA3 GAL2),
both derivatives of common lab strains, were used for the
experiment. We estimate from RFLP analysis that Y55 and
Y24 (a derivative of S288C) differ at approximately one
base pair per 100. The two strains were mated, and the
resulting diploid hybrid was sporulated, yielding 4
haploid spore clones. For any given genetic locus, each
spore clone ("daughter") received either the Y55 or the
Y24 allele. The purpose of the test was to determine if
we could specifically isolate, en masse, DNA from all the
loci at which two individuals (here, pairs of parent and
daughter clones) share genetic identity, excluding DNA
from all regions where there was no identity by descent
between the pair. We applied our genome mismatch scanning
method to determine, for several loci, which spore clones
had identity by descent with each parent. The results of
the genome mismatch scanning analysis were compared with
results from conventional analysis (Table 1), using
auxotrophic and drug resistance markers. Conventional
analysis consisted of testing for growth on appropriate
selective media. Four loci were tested: HIS4, CAN1, URA3
and GAL2. HIS4 is on chromosome 3, CAN1 and URA3 on
chromosome 5, and GAL2 is on chromosome 12. Our analysis
of these loci included a total of 15 independent PstI
restriction fragments, each of which constituted an
independent test of the genomic mismatch scanning method,
as their sometimes adjacent location in the genome was
immaterial to their behaviour in the selection. The
result of the test was that all 15 PstI restriction
fragments analyzed were recovered if and only if they were
identical by descent between the parent and daughter being
compared (Tables 1 and 2). This result confirms the
principles underlying genomic mismatch scanning.

Procedure:

1. DNA Isolation: High molecular weight DNA was
isolated from each yeast strain (parent or spore clone) by
a standard method (Methods in Enzymology 194
(chapter 11):169-182).

2. Initial Restriction Enzyme Digestion: Each DNA
sample was digested completely with PstI restriction
enzyme. The DNA was recovered by phenol:chloroform
extraction and ethanol precipitation, and resuspended in
Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

3. Methylation of DNA From Parental Strains with
Dam Methylase: DNA samples from the two parental strains,
Y55 and Y24, were fully methylated with E. coli Dam
methylase (New England Biolabs), at a DNA concentration of
0.25 mg/ml, using 4 units of enzyme/µg of DNA in an
overnight incubation at 37°, in the buffer recommended by
the manufacturer. The samples were extracted with
phenol:chloroform, ethanol precipitated, and resuspended
in Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

4. Mixing and Solution Hybridization of Paired Test
DNA Samples:

A. Y55 DNA (5 µg in 45 µl) + spore clone 1b DNA
(5 µg in 80 µl) was denatured by adding 7.5 µl of 5 M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 16 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer (4 M NaSCN, 20 mM
Tris-HCl pH 7.9, 0.2 mM EDTA) were then added, and the
sample was adjusted to 400 µl with water. 90% phenol in
water was added until an emulsion was apparent (about
80 µl), and then the sample was agitated to maintain the
emulsion for 12 hours at room temperature (typically about
23°).

B. Y55 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5 M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

C. Y24 DNA (5 µg in 45 µl) + spore clone 1a DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5 M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

D. Y24 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5 M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

To recover DNA, the samples were each extracted once
with chloroform, then ethanol precipitated, and
resuspended [...].
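
The NaOH volumes in step 4 differ between sample A and samples B to D, yet they work out to the same final concentration. The short calculation below is offered only as a check on the arithmetic; it assumes the DNA volumes in sample A read as 45 µl and 80 µl, and the function name is hypothetical.

```python
def final_molarity(stock_molar: float, stock_ul: float, sample_ul: float) -> float:
    """Molarity after adding stock_ul of a stock solution to sample_ul of sample."""
    return stock_molar * stock_ul / (stock_ul + sample_ul)


# Sample A: 45 µl + 80 µl of DNA, 7.5 µl of 5 M NaOH (volumes as read from step 4A).
naoh_sample_a = final_molarity(5.0, 7.5, 45 + 80)
# Samples B to D: 45 µl + 45 µl of DNA, 5.4 µl of 5 M NaOH.
naoh_sample_b = final_molarity(5.0, 5.4, 45 + 45)
print(round(naoh_sample_a, 3), round(naoh_sample_b, 3))
```

Both mixtures come to roughly 0.28 M NaOH, which suggests the larger NaOH volume in sample A simply compensates for the larger DNA volume.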

5. Digestion of Homohybrid Molecules (Both Strands
From the Same Source) with DpnI and MboI: 105 µl of
each of the homohybrid strands was digested at 37° for 2
hours in a final volume of 400 µl of NEB buffer 3 with 100
units of DpnI and 25 units of MboI.

6. Removal of Residual Unannealed DNA: After
MboI/DpnI digestion, samples were extracted with
phenol/chloroform, 100 µl of 5 M NaCl was added to each and
then samples were incubated with 100 mg of BNDC cellulose
(Sigma) equilibrated with 50 mM Tris, pH 8.0, 1 M NaCl, at
4° for 3 hours. The sample was centrifuged at 14000 rpm
in a microfuge, then the supernatant was extracted twice
with phenol/chloroform and once with chloroform, then
ethanol precipitated, washed with 70% ethanol, dried and
resuspended in 90 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

7. Selective nicking of mismatched hybrid DNAs:
15 µl of each DNA sample was mixed with 5.2 ng of MutH
protein, 340 ng of MutL protein, 700 ng of MutS protein
(all proteins provided in purified form by Paul Modrich,
Duke University), in a final volume of 60 µl of a buffer
consisting of: 50 mM Hepes (pH 8.0), 20 mM KCl, 5 mM
MgCl2, 1 mM DTT, 50 µg/ml bovine serum albumin, and 2 mM
ATP. The mixture was incubated at 37° for 30 minutes, and
the reaction was then stopped by heating to 65° for 10
minutes.

8. Exonuclease III Digestion to Convert Nicks into
Single-Strand Gaps, and Ends from MboI or DpnI Cleavage
into Single-Strand Tails: The volume of the entire sample
from step 7 was adjusted to 200 µl by adding 140 µl of a
buffer consisting of: 50 mM Tris-HCl (pH 8.0), 5 mM
MgCl2, 10 mM β-mercaptoethanol. Then 10 units of
exonuclease III were added and incubation continued for 10
min at 37°. This reaction was stopped by adding EDTA to
10 mM, followed by extraction with phenol/chloroform.

9. Removal of Partially or Fully Single-Stranded
DNA Molecules from the Mixture: 50 µl of 5 M NaCl + 250 µl
of 1 M NaCl were added to adjust to a volume of 500 µl at a
concentration of 1 M NaCl. 100 mg of BNDC cellulose
equilibrated with 50 mM Tris, pH 8.0, 1 M NaCl, was added
and the mixture incubated at 4° for 3 hours (Sedat,
et al. (1967) J. Mol. Biol. 26:537-540; Iyer and Rupp
(1971) Biochim. Biophys. Acta 228:117-126). The mixture was
centrifuged at 14000 rpm for 1 min, then the supernatant
was extracted once with phenol/chloroform, and ethanol
precipitated overnight. The small pellets were
resuspended in 15 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.
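
The salt adjustment in step 9 can be verified with a simple mixing calculation. This is a sketch under the assumption that the 200 µl sample from step 8 contributes no NaCl and that volumes are additive; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def mix(components: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (volume_ul, molarity) components; return (total_ul, final_molarity)."""
    components = list(components)
    total_volume = sum(volume for volume, _ in components)
    total_solute = sum(volume * molarity for volume, molarity in components)
    return total_volume, total_solute / total_volume


# 200 µl sample (assumed NaCl-free) + 50 µl of 5 M NaCl + 250 µl of 1 M NaCl.
volume_ul, nacl_molar = mix([(200, 0.0), (50, 5.0), (250, 1.0)])
print(volume_ul, nacl_molar)
```

The result, 500 µl at 1 M NaCl, agrees with the volume and concentration stated in step 9.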
10. Analysis of the Selected DNA Pool by Southern
Blotting: One-sixth of each of the resulting DNA samples
was electrophoresed through a 0.7% agarose gel, in TBE
buffer for 15 hours at 70 volts. DNA was transferred to a
nylon filter by Southern blotting, and the filter was
probed successively with labelled DNA from lambda phage
clones corresponding to the 4 specific genetic loci, HIS4,
CAN1, URA3 and GAL2. In each case, 3-5 restriction
fragments were readily detected in the lanes corresponding
to DNA samples that had identity by descent at the test
loci, and not in the lanes corresponding to samples that
were known from direct tests not to match at the locus
being probed (see Tables 1 and 2).



Table 1.

                        Locus
            Strain   CAN1   URA3   HIS4   GAL2
parents     Y24      A      A      A      A
            Y55      B      B      B      B

daughters   1a       B      B      A      A
            1b       A      B      A      A
            1c       A      A      B      B

The two alleles at each locus are designated A and B,
respectively, for the alleles present in Y24 and Y55. Each
spore clone inherits an allele from one of its two
parents, either the A allele from Y24 or the B allele from
Y55. The alleles at these loci can be distinguished
directly by testing for growth in specific media.
Table 2. Summary of the results of the Genome Mismatch
Scanning test.

                                         Locus
Comparison #   Relative Pair       CAN1   URA3   HIS4   GAL2
1              Y24/daughter 1a     -      -      +      +
2              Y55/daughter 1c     -      -      +      +
3              Y24/daughter 1c     +      +      -      -
4              Y55/daughter 1b     -      +      -      -

Number of restriction fragments    [4]    [4]    [3]    [5]

"-" indicates no DNA was recovered for any of the
restriction fragment bands detected by the DNA probe
specific for the indicated locus (neglecting faint bands
from cross-hybridization to unlinked sequences).

"+" indicates recovery of DNA in all restriction bands
detected by the DNA probe specific to the indicated locus.

The number of restriction fragment bands surveyed by the
probe used for the indicated locus is indicated in
brackets in the bottom row of the table. The probes used
in each case were bacteriophage lambda clones from the
ordered collection established by Maynard Olsen. The
specific clones used as probes were: CAN1: clone 5917,
HIS4: clone 4711, URA3: clone 6150, GAL2: clone 6637. The
clone numbers are the numbers assigned by Maynard Olsen.
For convenience, only 4 of the eight possible
parent-daughter combinations were tested. The results of
the genetic tests for 4 loci are shown in Table 1. Numerous
other pairwise comparisons have subsequently been tested
with similar results.
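
The relationship between the two tables can be made explicit with a small sketch: given the allele assignments of Table 1, a locus is identical by descent between a parent and a daughter exactly when both carry the same allele, and the text reports that fragments were recovered if and only if that was the case. The code below merely derives the expected pattern for the four comparisons tested; the variable and function names are hypothetical.

```python
LOCI = ["CAN1", "URA3", "HIS4", "GAL2"]

# Allele assignments as given in Table 1 (A = Y24 allele, B = Y55 allele).
genotypes = {
    "Y24": dict(zip(LOCI, "AAAA")),
    "Y55": dict(zip(LOCI, "BBBB")),
    "1a": dict(zip(LOCI, "BBAA")),
    "1b": dict(zip(LOCI, "ABAA")),
    "1c": dict(zip(LOCI, "AABB")),
}

# The four parent/daughter comparisons tested.
comparisons = [("Y24", "1a"), ("Y55", "1c"), ("Y24", "1c"), ("Y55", "1b")]


def expected_recovery(parent: str, daughter: str) -> str:
    """'+' where parent and daughter share the allele (identity by descent), else '-'."""
    return "".join(
        "+" if genotypes[parent][locus] == genotypes[daughter][locus] else "-"
        for locus in LOCI
    )


expected = {pair: expected_recovery(*pair) for pair in comparisons}
for (parent, daughter), pattern in expected.items():
    print(f"{parent}/daughter {daughter}: {pattern}")
```

Each output string lists the loci in the order CAN1, URA3, HIS4, GAL2.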



It is evident from the above results that the
subject methodology provides for numerous advantages. The
methods provide access to a large set of highly
polymorphic markers required for linkage mapping with
small family units. A great increase in the effective
number of informative markers is achieved without a
corresponding increase in the number of individual tests,
since all the markers are screened in parallel in a single
procedure. By allowing much smaller sets of related
individuals to be used for linkage mapping, the
affected-relative-pair and homozygosity-by-descent mapping
methods reduce the cost and labor involved in
developing the human genetic map. Genomic mismatch
scanning allows for the practical application of linkage
mapping to genetically heterogeneous or quantitative
traits, such as cardiovascular disease, asthma,
psychiatric disorders, epilepsy, obesity, cancer and
diabetes.

The subject methodology does not rely on any
previously-mapped genetic markers. Thus, one can use the
subject methodology to begin immediately to develop the
genetic and physical maps of a genome for which little or
no prior map information is available. This can find
particularly important application in the breeding of
plant or animal species, as well as in development of the
genetics of such species.

Each pair-wise analysis allows sites of meiotic
recombination to be mapped. In grandparent-grandchild
pairs, identity-by-descent maps specifically identify the
sites of meiotic recombination in the corresponding
parent. Questions such as the relationship between the
genetic and physical map, locations of sites of enhanced
or diminished recombination, effects of age, sex and other
factors on the frequency and distribution of meiotic
recombination events, and the relationship between
recombination and non-disjunction can be readily
investigated in this way.

Finally, the ability to detect directly regions of
the genome that have lost heterozygosity may be useful in
identifying putative tumor-suppressor genes and in the
[...]. Loss of
heterozygosity at specific loci appears to be an important
genetic event in the development of many cancers.

All publications and patent applications cited in
this specification are herein incorporated by reference as
if each individual publication or patent application were
specifically and individually indicated to be incorporated
by reference.

Although the foregoing invention has been described
in some detail by way of illustration and example for
purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of
the teachings of this invention that certain changes and
modifications may be made thereto without departing from
the spirit or scope of the appended claims.

WHAT IS CLAIMED IS:



1. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related sources, wherein
each of said sources contributes at least about 5 x 10^ bp
of DNA, said method comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids.

2. A method according to Claim 1, wherein said
different modification and separating is by means of:

methylating the DNA from one of said related sources
or methylating the DNA from both of said sources with
different methylases;

cleaving said DNA duplexes with a methyl-sensitive
restriction enzyme resulting in cleaving of homohybrid
DNA; and

segregating heterohybrid DNA from cleaved homohybrid
DNA;

and said introducing lesions is by means of:

bringing together said heterohybrid DNA with enzymes
of the methyl-directed mismatch repair pathway and an
exonuclease, whereby nicks are introduced into said
heterohybrid DNA comprising a lesion and said lesion is
extended into a gap by said exonuclease to provide
partially single stranded and single stranded DNA;

dividing said partially single-stranded and single
stranded DNA from said perfectly complementary DNA
duplexes; and

labeling said perfectly complementary DNA duplexes
for use in genetic mapping or identification.

3. A method according to Claim 2, wherein said 
enzymes comprise MutL, MutS, and MutH of E. coli. 

4. A method according to Claim 2, wherein included
in said combining step is helicase II, single-strand
binding protein, and at least one of exonuclease I and
VII; or wherein said exonuclease is exonuclease III of
E. coli.

5. A method according to Claim 2, wherein said
dividing comprises:

combining said partially single-stranded and single
stranded DNA and DNA duplexes with benzoylated
naphthylated DEAE cellulose (BNDC) at high salt
concentration; and

freeing the DNA bound to the BNDC cellulose from the
perfectly complementary DNA duplexes.

6. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related genomes, wherein
each of said genomes contributes at least about 10^ bp of
DNA, said method comprising:

combining DNA restriction fragments from first and
second related genomes, wherein said DNA substantially
consists of restriction fragments comprising a GATC
sequence, under melting and reannealing conditions to form
homohybrid and heterohybrid DNA duplexes, wherein said DNA
fragments from said first and second sources are
different;

segregating homohybrid from heterohybrid DNA duplexes
by means of the difference in said DNA duplexes;

bringing together heterohybrid DNA duplexes with
enzymes consisting of MutL, MutS, and MutH of E. coli,
resulting in lesions consisting of nicks in DNA duplexes
that contain mismatches;

treating the resulting mixture of nicked and unnicked
DNA molecules with exonuclease III, such that nicked DNA
molecules in the mixture are rendered partially single
stranded; and

separating said partially single stranded DNA from
completely double stranded DNA.

7. A method according to Claim 6, wherein said
method comprises one of:

including in said bringing together, or in a
subsequent step, a DNA polymerase and labeled
nucleotides, wherein said labeled nucleotides become
incorporated into said nicked DNA duplexes; and

separating said labeled DNA from unlabeled DNA by
means of said label; or

combining said partially single-stranded and single
stranded DNA and completely double stranded DNA with
benzoylated naphthylated DEAE cellulose at high salt
concentration; and

separating the DNA bound to the cellulose from the
completely double stranded DNA; or

cleaving partially single stranded DNA at the site of
said single strand to provide small DNA duplexes; and

separating said small DNA duplexes from uncleaved DNA
duplexes.

8. A method according to Claim 6, wherein said DNA
restriction fragments are different in having different
termini, wherein the difference in termini is used to
separate heterohybrids from homohybrids.

9. A method according to Claim 6, wherein said
bringing together further comprises:

including with said heterohybrid DNA, helicase II,
exonuclease I and/or VII, single strand binding protein,
DNA polymerase III, and labeled nucleotides, wherein said
labeled nucleotides become incorporated into partially
single stranded DNA to provide labeled DNA; and

separating said labeled DNA from unlabeled DNA by
means of said label.

10. A method for identifying nucleic acid areas of
identity with a probe obtained by a separating method for
separating DNA duplexes from a complex mixture of DNA from
two related sources, wherein each of said sources
contributes at least about 5 x 10^ bp of DNA, said method
comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids; and

labeling said perfectly complementary heterohybrids;

said method comprising:

combining said probe with an ordered array of DNA
molecules representing at least a portion of a genetic map
under conditions wherein said probe hybridizes to
homologous DNA; and

detecting the areas of identity by means of said
label.
11. A method according to Claim 10, wherein said
ordered array is a metaphase chromosome spread.

12. A method according to Claim 10, wherein said
ordered array is an array of clones representing at least
a portion of said genetic map.

13. A method according to Claim 10, wherein said
ordered array is an array of DNA sequences amplified
in vitro, representing at least a portion of said genetic
map.

14. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch-containing fragments of said
DNA;

labeling said nicked DNA fragments to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

15. A method according to Claim 14, wherein said
labeling comprises incubating said lesion-containing small
DNA fragments with a polymerase which lacks 3' to 5'
exonuclease activity and labeled nucleotide triphosphates.
16. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch-containing fragments of said
DNA;

treating said DNA fragments with an exonuclease such
that those fragments with nicks are rendered
single-stranded or partially single-stranded;

separating single-stranded and partially
single-stranded fragments from completely double-stranded
DNA fragments by adsorption to BNDC;

labeling said single-stranded or partially
single-stranded small DNA fragments or labeling perfectly
complementary dsDNA to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

17. A method according to Claim 16, wherein said
labeling of said lesion-containing partially single
stranded DNA comprises adding DNA polymerase III and
nucleotide triphosphates including a label to said
partially single stranded DNA.

18. A method of genetic mapping comprising:

combining fragmented first genomic DNA from two
different sources under conditions wherein said fragments
can anneal together to form heterohybrid dsDNA duplexes
and homohybrid dsDNA duplexes, wherein the size of the
fragments is selected to substantially ensure that
heterohybrids formed from genetic regions which are not
identical by descent will contain at least one base
mismatch and to ensure the ability to separate perfectly
matched duplexes from mismatched duplexes;

separating heterohybrid duplexes from homohybrid
duplexes and matched duplexes from mismatched duplexes by
preferentially modifying heterohybrid duplexes and
mismatched homohybrid duplexes;

labeling at least one of the strands of said matched
heterohybrid duplexes to provide detectable probes without
expansion of said matched heterohybrid duplexes; and

combining second genomic DNA with said detectable
probes to detect and/or map sites of matched DNA of said
sources.

19. A method according to Claim 18, wherein said
method further comprises:

methylating the DNA from at least one of said sources
at specific consensus sequences with a different enzyme
for the DNA from each source; and

said separating comprises:

cleaving said duplexes with at least one restriction
endonuclease, wherein duplexes which are unmethylated or
doubly methylated are cleaved to provide smaller
fragments;

isolating the uncleaved duplexes free of said cleaved
duplexes;

nicking mismatched duplexes and introducing gaps in
said nicked mismatched duplexes while leaving matched
duplexes unchanged; and

segregating gapped duplexes from matched duplexes.



20. A method according to Claim 19, wherein said 
nicking and introducing gaps employs the proteins of a 
methyl directed mismatch repair pathway. 
21. A method according to Claim 19, wherein said
nicking and introducing gaps employs the proteins of the
methyl-directed mismatch repair pathway and exonuclease
III.

22. A method according to Claim 19, wherein said
segregating comprises combining said duplexes with BNDC
cellulose and separating DNA bound to said BNDC cellulose
from unbound duplexes.

23. A method according to Claim 18, wherein said
second genomic DNA is an ordered array of DNA molecules
representing at least a portion of a genetic map.

24. A kit comprising MutL, MutS, MutH enzymes,
labeled triphosphates or labeled linkers, and a DNA
polymerase.

25. A kit according to Claim 24, further comprising
at least one of the following: a modification enzyme, and
an enzyme that cleaves at other than hemimethylated sites
recognized by said modification enzyme; an exonuclease;
helicase II; single-strand binding protein; BNDC
cellulose; an ordered array of DNA molecules comprising at
least a portion of a genetic map.